Journal of Educational Evaluation for Health Professions

Lazar Stevanović

1 Article

Efficacy and limitations of ChatGPT as a biostatistical problem-solving tool in medical education in Serbia: a descriptive study: Aleksandra Ignjatović, Lazar Stevanović; J Educ Eval Health Prof. 2023;20:28. Published online October 16, 2023; DOI: https://doi.org/10.3352/jeehp.2023.20.28

1,810 View
167 Download
1 Web of Science
2 Crossref

Purpose
This study aimed to assess the performance of ChatGPT (GPT-3.5 and GPT-4) as a study tool in solving biostatistical problems and to identify any potential drawbacks that might arise from using ChatGPT in medical education, particularly in solving practical biostatistical problems.
Methods
ChatGPT was tested to evaluate its ability to solve biostatistical problems from the Handbook of Medical Statistics by Peacock and Peacock in this descriptive study. Tables from the problems were transformed into textual questions. Ten biostatistical problems were randomly chosen and used as text-based input for conversation with ChatGPT (versions 3.5 and 4).
Results
GPT-3.5 solved 5 practical problems in the first attempt, related to categorical data, cross-sectional study, measuring reliability, probability properties, and the t-test. GPT-3.5 failed to provide correct answers regarding analysis of variance, the chi-square test, and sample size within 3 attempts. GPT-4 also solved a task related to the confidence interval in the first attempt and solved all questions within 3 attempts, with precise guidance and monitoring.
Conclusion
The assessment of both versions of ChatGPT performance in 10 biostatistical problems revealed that GPT-3.5 and 4’s performance was below average, with correct response rates of 5 and 6 out of 10 on the first attempt. GPT-4 succeeded in providing all correct answers within 3 attempts. These findings indicate that students must be aware that this tool, even when providing and calculating different statistical analyses, can be wrong, and they should be aware of ChatGPT’s limitations and be careful when incorporating this model into medical education.

Citations

Citations to this article as recorded by

Can Generative AI and ChatGPT Outperform Humans on Cognitive-Demanding Problem-Solving Tasks in Science?
Xiaoming Zhai, Matthew Nyaaba, Wenchao Ma
Science & Education.2024;[Epub] CrossRef
Opportunities, challenges, and future directions of large language models, including ChatGPT in medical education: a systematic scoping review
Xiaojun Xu, Yixiao Chen, Jing Miao
Journal of Educational Evaluation for Health Professions.2024; 21: 6. CrossRef

Author index